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A method to accelerate the matrix- vector products of j-scheme nuclear Shell-Model Configuration 
Interaction (SMCI) calculations is presented. The method takes advantage of the matrix product 
form of the j-scheme proton-neutron Hamiltonian matrix. It is shown that the method can speed up 
unrestricted large-scale pf-shell calculations by up to two orders of magnitude compared to previously 
existing related j-scheme method. The new method allows unrestricted SMCI calculations up to 
j-scheme dimension 10 10 to be made in more complex model spaces. 
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Introduction: The nuclear Shell-Model is the most general microscopic nuclear model and is in principle able to 
describe all properties of nuclei. Nuclear Shell-Model Configuration Interaction (SMCI) calculations in large and 
realistic single-particle (s.p.) model spaces are, however, very difficult to make due to the extremely large Hilbert 
. space dimensions involved. The continuous increase in computing power has made it possible to make progressively 
larger nuclear SMCI calculations in restricted model spaces. Currently existing nuclear SMCI methods/programs make 
it possible to calculate nuclear wavefunctions exactly in the model spaces sd and pf Q and in pf 5/259/2 model space 
with somewhat truncated calculations. In larger model spaces drastic truncations (i.e. selection of allowed particle 
configurations) have to be made. Two basic problems arise in large-scale SMCI calculations: Large dimensions require 
^ ' huge amounts of storage space for the calculated states, and the number of non-zero Hamiltonian matrix elements 
becomes prohibitively large (10 14 — 10 16 ) even though the Hamiltonian matrix stays very sparse. Therefore both 
available memory and computational speed often become inadequate for the task at hand. 

Using very large computing resources to solve SMCI problems is the brute force approach to circumvent these 
problems. Alternatives to it are to develop mathematical methods to truncate large calculations and to keep only 
\Q , physically relevant degrees of freedom in the wavefunctions, or to make the most time consuming parts of SMCI 
calculations more efficiently. Various truncation methods have been developed during the last twenty years, for 
example the methods that produce exponential convergence of observables as a function of truncation A very 

+- j" > ' good example of the methods that make SMCI calculations more efficient in terms of computation resource usage is 
being used in the SMCI code nathan and its cousin, the code antoine 0, 0] • These codes use a novel Hamiltonian 
matrix compression method where the SMCI proton-neutron Hamiltonian matrix elements are not stored before a 
SMCI calculation, but constructed during each matrix-vector operation of a Lanczos procedure using precalculated 
Reduced Density Matrix (RDM) elements. This innovation made it possible to make the first unrestricted SMCI 
• rH , calculations in the pf-shell where other SMCI programs using older methods could not work without truncations. 
For example in the nucleus 56 Ni in the pf-shell this method avoids the explicit storage of 10 14 non-zero Hamiltonian 
matrix elements. 

The most basic SMCI method is the m-scheme method |(| that uses bare Slater determinants of spherically sym- 
metric s.p. orbit configurations as its many-body basis states. The basic problem with the m-scheme SMCI is that 
the Slater determinant basis dimension is maximal and therefore a lot of storage (from gigabytes to tens of gigabytes) 
is needed for each calculated Lanczos basis vector in large-scale calculations. A common method to reduce the large 
matrix dimensions of the m-scheme SMCI is to use the existing symmetries of the nuclear Hamiltonian. The j-scheme 
SMCI method uses angular momentum projected many-body basis states, but does not have good isospin, and is 
used in the SMCI code nathan. Compared to the m-scheme the j-scheme typically reduces the SMCI dimensions by 
two orders of magnitude for low-spin states and less for high-spin states. This property makes it most suitable for 
low-spin states, such as the ground states of double-even nuclei. The j-scheme method of matrix-vector products 
is further developed here to make it numerically more efficient and more suitable for very large SMCI calculations. 

Theoretical overview: The j-scheme method forms the SMCI many-body basis states using angular momentum 
coupled products of proton and neutron many-body basis states. Lanczos vectors \vjm) are expanded as 
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where j p and j n are proton and neutron total angular momenta and the quantum numbers a p and a n sum over all 
possible combinations of linearly independent states. Because of the angular momentum coupled structure of basis 
states, the full Hamiltonian matrix can be naturally divided into matrix blocks that contain all allowed bra and ket 
particle configurations but where the proton and neutron bra and ket angular momenta are kept fixed. One such 
Hamiltonian matrix block will be concentrated on here. In addition, only the proton-neutron part of the nuclear 
two-body Hamiltonian is considered, whose treatment in the proton-neutron formalism is the most time consuming 
part of a j-scheme SMCI calculation. 

Using the angular momentum coupled states of (1), the matrix elements of the nuclear proton-neutron Hamiltonian 
are sums of the products of proton one-body RDM elements, neutron RDM elements, angular momentum recoupling 
factors and transformed two-body interaction matrix elements: 
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as originally shown by French El. F x (aa'bb') are particle-hole transformed two-body interaction matrix elements and 
r A is an angular momentum recoupling factor (see Q for more details). Eq. (2) shows the principal idea of 
Create row and column indices for proton and neutron RDM elements in such a way that the indices of the full 
Hamiltonian matrix can be obtained just by summing together the proton and neutron indices. In this way a small 
amount of RDM elements can generate a large number of Hamiltonian matrix elements. The method of Eq. (2) will 
now be extended. To simplify the subsequent formulae the d\ x d[ neutron RDM is written as 
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where j £ [1, di] and j' 6 [1, d[] and the di X d' 2 proton RDM as 
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where i 6 [1,^2] and i' € [l,rf 2 ]- Inside the RDMs A and B each set of quantum numbers [a n ] has a unique row or 
column index i. The density operator indices label sets of s.p. quantum numbers, a = {n a ,l a , j a }, b — {nb,lb,jb}- 
Since each density operator only connects one bra particle configuration to one ket configuration, the matrices A and 
B are sparse supermatrices that consist of dense blocks. Furthermore, a matrix A aa x that corresponds to a certain 
density operator [c£c a /] , has only one dense matrix block on each supermatrix block row. 

A Lanczos basis vector \vjm) and the result of a Hamiltonian matrix- vector product \ujm) = H\vjm) can be 
ordered so that their basis state amplitudes can be expressed in terms of matrices: 
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and similarly for the vector \ujm) where the constant quantum numbers have been omitted for simplicity. Using Eqs. 
(2-5), the Hamiltonian matrix of Eq. (2) can be transformed to sums over direct products of proton and neutron 
density matrices and the matrix- vector product can be expressed as a sum of triple matrix products: 
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In this equation the indices i' and f are implicitly summed over. Alternatively, one may change the order of the 
matrix products and use the equation 

U * = E A^V ri ,{^2^F x {aa'bb')B^^) , (6) 

Xaa' bb' 

if it uses less sum and multiplication operations. Usually for N v =/= Z v nuclei the density matrices corresponding to 
smaller number of valence particles/holes are smaller and should be used in the innermost loop. Eqs. (6) and (7) can 
still be further optimised. Because the same density matrix A aa A is used for more than one Hamiltonian block, Eq. 
(6) can be converted to the form 

u # = E(E^( aaW )^' A ) E 1 ^'^- (7) 

Xbb' aa' j' n 

where the quantum number j' n of a neutron many-body basis state goes through all neutron angular momenta 
quantum numbers that can be coupled with fixed proton ket angular momentum quantum number j' p to a total 
angular momentum J. In this way multiple right-hand side matrices V J ™ can be multiplied with their corresponding 
density matrices B b& Aj ™ and summed together before performing the multiplications with the blocks of matrix A aa x . 
Eq. (8) reduces to Eq. (6) for J = states, but reduces floating point operations for states with higher angular 
momentum by approximately 30 — 50%. It is however quite complex to implement and therefore has not yet been 
used in this work. Note: The form of Eqs. (6-7) makes them ideal for separable interactions, such as the pairing plus 
quadrupole interaction = —GP^P — |xEu < 3jL < 3 V or tne center-of-mass interaction p, because the resulting 
separability of the two-body interaction matrix elements, F\(aa'bb') = f\(aa')f\(bb'), allows the full matrices A and 
B in Eqs. (6-8) to be used totally independently of each other. 

Discussion: The results of unrestricted benchmark calculations made in the pf-shell model space for nuclei from 
44 Ti to 56 Ni using an implementation of this method in the SMCI code eicode |l(J, [llj are presented in Fig. 1. All 
results are calculated using one AMD Opteron 2.4 GHz processor. It can be seen that the number of mathematical 
operations per one Hamiltonian matrix-vector product scales as d 1A6 for angular momenta J = 0, 2 where d is the 
SMCI matrix dimension. The number of mathematical operations for higher angular momenta (which have exactly 
the same scaling) have not been included for clarity. In this work the best implementation of the original j-scheme 
matrix vector product method of 0,0 scales as d 1S2 as a function of basis dimension (Fig. 1, dashed line). Whereas 
calculations for all angular momentum values scale exactly similarly in the case of the old method, the new method 
has slightly different constant multiplicative factors that depend on angular momentum and the isospin z-componcnt. 
The dependence on angular momentum is roughly 2 J + 1 and can be removed to a large extent by using Eq. (8) 
for matrix- vector products instead of Eqs. (6-7). Both the old method and the method of Eqs. (6-7) use the largest 
number of floating point operations for nuclei with N v = Z v , and therefore the wavefunctions of these nuclei are the 
most time consuming ones to calculate. 

The more favourable scaling of floating point operations in this modified method reduces them significantly for very 
large calculations. In the case of + states of 56 Ni the reduction of floating point operations is 35-fold. Considering the 
actual matrix-vector product times, the time plot in Fig. 1 inset shows two different regimes. In the low-dimensional 
cases matrix products are very fast, initialisation overheads dominate the calculation time, and therefore only the 
scaling of matrix-vector operation times in the higher-dimensional regime from 50 Mn onwards are of interest. In this 
regime the matrix-vector operation times scale as n 128 . This scaling can be compared against the n scaling of the 
m-scheme code antoine, shown on page 445 of where the time for 15 Lanczos iterations for + states of the pf-shell 
nucleus 52 Fe is roughly 10 4 seconds and roughly 10 5 seconds for 56 Ni (extrapolated), giving average matrix- vector 
multiplication times of 670 and 6700 seconds using a modern microprocessor comparable to the one used in this work. 
For the same two calculations the new j-scheme method that uses Eqs. (6) and (7) results with one Hamiltonian 
matrix-vector multiplication taking respectively 83 and 1317s. The code antoine is therefore five times slower than 
the new j-scheme method of Eqs. (6-7) for 56 Ni. Compared to the original j-scheme method of used in eicode, 

the new method shortens the calculation time 75-fold for this nucleus. 

The reason why the new method is faster than what the reduction of the number of mathematical operations 
alone in Fig. 1 suggests, is that the matrix products of Eqs. (6-7) can be implemented very efficiently on a modern 
computer using fast matrix product kernel routines For large SMCI calculations, where the RDM blocks have 
large dimensions, the new method can use up to 70% of a processor's theoretical peak floating point capacity in 
the pf-shell model space and even slightly more in larger s.p. model spaces. In contrast, the efficiency of the best 
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FIG. 1: (Color online) Number of mathematical operations needed for one matrix- vector product as a function of basis 
dimension for various N v = Z v and N v = Z v + 2 pf-shell nuclei from 44 Ti to 60 Zn. The lines are fit functions, shown in the 
legend box. Inset: Actual time per one matrix vector operation for + states of N v = Z v pf-shell nuclei using the new method. 
The solid line is a fitted function F = 8.28 • W~ 7 d 128 . 




FIG. 2: (Color online) Same as Fig. 1, but for unrestricted calculations for N = Z and N = Z + 2 nuclei in the sdgj /2/111/2 
model space using the method of Eqs. (6-7). 



implementation of the original j-scheme method of [lj in the code eicode is always low, roughly 10 — 15% of the 
processor's peak capacity. Looking at the calculation time plot of [j| it is suspected that this is also the case for 
the codes antoine and nathan. The reasons for this inefficiency are the explicit calculation of matrix indices and the 
explicit construction of every Hamiltonian matrix element, which are avoided in the new method. 

Conclusions: It has been shown that the numerical effort needed to use the optimised j-scheme matrix-vector 
products of Eqs. (6-7) scales more favourably than the old j-scheme method, used in the SMCI codes nathan and 
eicode, as a function of basis dimension, making the new method up to two orders of magnitude faster in the pf-shell. 
The same speedup and scaling of floating point operations also happens in the sdg7/ 2 hn/2 model space used for 
nuclei above 100 Sn. This can be seen from Fig. 2, where the scaling of floating point operations has been plotted for 
various N v , Z v = 3 — 8 nuclei in this model space. The increased efficiency makes the method competitive against the 
m-scheme SMCI method of 0, Q that is used in most modern m-scheme SMCI codes. 

With the j-scheme matrix vector product method shown here it is possible to make unrestricted SMCI calculations 
for all nuclei in the pf model space and for most nuclei in the pf 5 / 2 gg/2 model space using modest computing resources. 
For more complex model spaces more computational power will be necessary, but calculations still stay tractable, since 
j-scheme dimensions of the order of 2 • 10 9 can be handled easily. Table 1 shows estimated matrix-vector product 
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TABLE I: M-scheme and j-scheme dimensions, number of floating point operations and estimated computation time with one 
CPU for various unrestricted SMCI calculations for the + states of nuclei in the sdg 7 / 2 hii/2 model space. N v and Z v are the 
number of valence neutrons and protons. 



(N V ,Z V ) 


Dim (m) 


Dim (j) 


F 


Ti[b] 


(6,4) 


8.56 • 10 8 


7.24 • 10 6 


1.38 ■ 10 12 


550 


(5,5) 


1.06 ■ 10 9 


8.92 ■ 10 6 


1.73 ■ 10 12 


870 


(7,5) 


1.65 ■ 10 10 


1.21 ■ 10 8 


8.45 ■ 10 13 


2.9 ■ 10 4 


(6,6) 


2.00 ■ 10 10 


1.45 • 10 8 


1.02 ■ 10 14 


3.4 • 10 4 


(8,6) 


2.20 ■ 10 11 


1.43 • 10 9 


3.11 ■ 10 15 


1.03 • 10 6 


(8,8) 


2.43 ■ 10 12 


1.50 • 10 9 


3.08 ■ 10 15 


1.02 ■ 10 6 



times for various nuclei in the sdg^^hn^ model space for up to eight valence protons and neutron holes. For up to 10 
valence particles+holes the method of Eqs. (6-7) can be used without large computational resources. The calculations 
with more than 10 valence particles need larger computational resources. Using the original j-scheme method of 00], 
at least two orders of magnitude more resources would have to be used to obtain results equally quickly for large-scale 
calculations. The m-scheme SMCI methods also encounter their dimensionality problem with these nuclei. 

Dimensions of the same order of magnitude as in Table 1 will also be encountered in the p/5/257/2^5/2 model space 
that is used for the description of deformed nuclei in the N w Z = 40 region and with the No-Core Shell-Model 
(NCSM) [l3| used for the ab initio description of the structure of light nuclei. The method described here makes 
it possible to do j-scheme SMCI calculations up to a dimension of 10 10 with current computing hardware and is a 
step towards unrestricted calculations in the aforementioned model spaces. A further increase of SMCI calculation 
dimensions can be sought for by combining this method with the advanced truncation methods that show exponential 
convergence, such as the methods of Refs. 0,0,0] 
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